{
  "cells": [
    {
      "cell_type": "markdown",
      "id": "a34a9588-01ff-41fb-8662-c81f33f9e029",
      "metadata": {
        "id": "a34a9588-01ff-41fb-8662-c81f33f9e029"
      },
      "source": [
        "# DeepSkies Toolbox \n",
        "\n",
        "DeepUtils is a suite of tools that cuts the boilerplate out of machine learning. \n",
        "It automates initial data analysis, processing, loading, and testing/evaluation metrics. \n",
        "For ease of use, it is meant to be distributed via PyPI so that it can be installed with pip; for now, it is installed from GitHub.\n",
        "Once it is installed, restart your kernel and load it with an import statement.\n",
        "\n",
        "This tutorial walks you through the main workflow and ends with a small challenge problem that shows a further application."
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "\"\"\"\n",
        "\"Beta\" additions: \n",
        "  - Add command line help function for detailing information about the configuration options.\n",
        "  - Make the task types more fine-grained, e.g. 'object-detection', 'mask', 'anomaly-detection'.\n",
        "  - Module for more easily interacting with matplotlib.\n",
        "  - Separating diagnostics out as their own module.\n",
        "\"\"\""
      ],
      "metadata": {
        "id": "L2dtg16q93yk"
      },
      "id": "L2dtg16q93yk",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "\"\"\"\n",
        "TODO:\n",
        "  - Add the pythonic definitions as well, time permitting.\n",
        "\"\"\""
      ],
      "metadata": {
        "id": "wyi89GpBBrsp"
      },
      "id": "wyi89GpBBrsp",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "id": "4b9af9f8-104a-450a-acc3-cefe2bda665b",
      "metadata": {
        "id": "4b9af9f8-104a-450a-acc3-cefe2bda665b",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "e54a5afa-c701-4ff4-a0f9-93579cafdc42"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
            "Collecting git+https://****@github.com/AeRabelais/DeepUtilities-demo.git\n",
            "  Cloning https://****@github.com/AeRabelais/DeepUtilities-demo.git to /tmp/pip-req-build-il4w49t1\n",
            "  Running command git clone --filter=blob:none --quiet 'https://****@github.com/AeRabelais/DeepUtilities-demo.git' /tmp/pip-req-build-il4w49t1\n",
            "  Resolved https://****@github.com/AeRabelais/DeepUtilities-demo.git to commit bd1d84ea7d722b309041622460367e84809c7ac8\n",
            "  Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
            "  Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
            "  Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting pytest-cov<4.0.0,>=3.0.0\n",
            "  Downloading pytest_cov-3.0.0-py3-none-any.whl (20 kB)\n",
            "Requirement already satisfied: pytest>=4.6 in /usr/local/lib/python3.9/dist-packages (from pytest-cov<4.0.0,>=3.0.0->deeputilities==0.1.0) (7.2.2)\n",
            "Collecting coverage[toml]>=5.2.1\n",
            "  Downloading coverage-7.2.2-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (227 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m227.0/227.0 KB\u001b[0m \u001b[31m18.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: tomli in /usr/local/lib/python3.9/dist-packages (from coverage[toml]>=5.2.1->pytest-cov<4.0.0,>=3.0.0->deeputilities==0.1.0) (2.0.1)\n",
            "Requirement already satisfied: attrs>=19.2.0 in /usr/local/lib/python3.9/dist-packages (from pytest>=4.6->pytest-cov<4.0.0,>=3.0.0->deeputilities==0.1.0) (22.2.0)\n",
            "Requirement already satisfied: exceptiongroup>=1.0.0rc8 in /usr/local/lib/python3.9/dist-packages (from pytest>=4.6->pytest-cov<4.0.0,>=3.0.0->deeputilities==0.1.0) (1.1.1)\n",
            "Requirement already satisfied: packaging in /usr/local/lib/python3.9/dist-packages (from pytest>=4.6->pytest-cov<4.0.0,>=3.0.0->deeputilities==0.1.0) (23.0)\n",
            "Requirement already satisfied: iniconfig in /usr/local/lib/python3.9/dist-packages (from pytest>=4.6->pytest-cov<4.0.0,>=3.0.0->deeputilities==0.1.0) (2.0.0)\n",
            "Requirement already satisfied: pluggy<2.0,>=0.12 in /usr/local/lib/python3.9/dist-packages (from pytest>=4.6->pytest-cov<4.0.0,>=3.0.0->deeputilities==0.1.0) (1.0.0)\n",
            "Building wheels for collected packages: deeputilities\n",
            "  Building wheel for deeputilities (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for deeputilities: filename=deeputilities-0.1.0-py3-none-any.whl size=28077 sha256=48dc16c458e96cb20f0f94aa40f396a8524a6a8891e95ff3c49e8c49826892f5\n",
            "  Stored in directory: /tmp/pip-ephem-wheel-cache-coa9_bda/wheels/b5/ff/02/554584f23428391222d733fe07112ddf66ac916dab4274241e\n",
            "Successfully built deeputilities\n",
            "Installing collected packages: coverage, pytest-cov, deeputilities\n",
            "Successfully installed coverage-7.2.2 deeputilities-0.1.0 pytest-cov-3.0.0\n"
          ]
        }
      ],
      "source": [
        "# Replace YOUR_GITHUB_TOKEN with your organization's access token; never commit a real token.\n",
        "!pip install git+https://YOUR_GITHUB_TOKEN@github.com/AeRabelais/DeepUtilities-demo.git"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "3fdec6a2-4e6f-457e-839a-2f17349bdae5",
      "metadata": {
        "id": "3fdec6a2-4e6f-457e-839a-2f17349bdae5"
      },
      "outputs": [],
      "source": [
        "# Uncomment below upon installation!\n",
        "# import deeputils "
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## DeepBench for Data Generation "
      ],
      "metadata": {
        "id": "-drS-oDp_TLC"
      },
      "id": "-drS-oDp_TLC"
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Generating your configuration YAML"
      ],
      "metadata": {
        "id": "Iw9cNQdP_enM"
      },
      "id": "Iw9cNQdP_enM"
    },
    {
      "cell_type": "code",
      "source": [
        "# Generate a default configuration yaml using the following command:"
      ],
      "metadata": {
        "id": "klKyU7W7_htn"
      },
      "id": "klKyU7W7_htn",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "#### Understanding your Configuration File's Content"
      ],
      "metadata": {
        "id": "eB5jdVuN_-XP"
      },
      "id": "eB5jdVuN_-XP"
    },
    {
      "cell_type": "markdown",
      "source": [
        "..."
      ],
      "metadata": {
        "id": "BRN0_pmBAKCw"
      },
      "id": "BRN0_pmBAKCw"
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Generating Random Data Automatically"
      ],
      "metadata": {
        "id": "jYi7zUAQ_bMs"
      },
      "id": "jYi7zUAQ_bMs"
    },
    {
      "cell_type": "code",
      "source": [
        "# Run the following command to generate data on the fly, using the default config:\n"
      ],
      "metadata": {
        "id": "V_fpt4zC_qYv"
      },
      "id": "V_fpt4zC_qYv",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Updating your Configuration YAML"
      ],
      "metadata": {
        "id": "4scSJgVa_x3p"
      },
      "id": "4scSJgVa_x3p"
    },
    {
      "cell_type": "code",
      "source": [
        "# Use the following command for updating the yaml.\n"
      ],
      "metadata": {
        "id": "mNzmdwon_1ak"
      },
      "id": "mNzmdwon_1ak",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Understanding the YAML Configuration File\n",
        "\n",
        "A brief explainer of the YAML configuration file.\n"
      ],
      "metadata": {
        "id": "-eKrcBSzuw_w"
      },
      "id": "-eKrcBSzuw_w"
    },
    {
      "cell_type": "markdown",
      "source": [
        "### What's in the Configuration File?"
      ],
      "metadata": {
        "id": "ol4k3TD0Ol50"
      },
      "id": "ol4k3TD0Ol50"
    },
    {
      "cell_type": "markdown",
      "source": [
        "An example YAML configuration file looks as follows:\n",
        "\n",
        "```\n",
        "DATA:\n",
        "  CONFIGURATION: variable that gives info on how your dataset is configured.\n",
        "  DATA_PATH: path of the training data\n",
        "  LABEL_PATH: path of the labels for the training data\n",
        "MODEL_DATA:\n",
        "  MODEL: model type to use for training, if the user already has a preference.\n",
        "  PRE_MODEL: saved model files from another task, if applicable.\n",
        "  TASK_TYPE: The type of ML task being undertaken.\n",
        "MODEL_PARAMS:\n",
        "  BATCH_SIZE: The preferred initial batch size.\n",
        "  LEARNING_RATE: The preferred initial learning rate.\n",
        "  OPTIMIZER: The preferred initial optimizer.\n",
        "OUTPUT:\n",
        "  OUTPUT_PATH: The path where all processing, training, testing, and diagnostic data will be output.\n",
        "```\n",
        "\n",
        "Some of the YAML variables come with a few preset options. \n",
        "\n",
        "\n",
        "\n",
        "```\n",
        "CONFIGURATION: {\n",
        "    \"LABELLED_IMAGE_CSV\": images with an accompanying CSV of label information\n",
        "    \"LABELLED_IMAGE_FOLDERS\": images split into folders by label\n",
        "    \"LABELLED_NUMPY\": images and labels held in NumPy arrays\n",
        "    \"UNLABELLED\": images only, no labels\n",
        "    \"DEEPBENCH\": generate random data with DeepBench\n",
        "    \"DEEPGOTDATA\": download data from DeepGotData\n",
        "    \"MNIST\": load MNIST data\n",
        "    \"CIFAR\": load CIFAR data\n",
        "    \"FMNIST\": load FMNIST data\n",
        "}\n",
        "\n",
        "MODEL: Any of the primary models listed via PyTorch's Model Zoo: https://pytorch.org/vision/stable/models.html\n",
        "\n",
        "TASK_TYPE: [\"CLASSIFICATION\", \"REGRESSION\", \"FULL_NEURAL_NETWORK\"]\n",
        "```\n"
      ],
      "metadata": {
        "id": "SaVGb4rt-lR8"
      },
      "id": "SaVGb4rt-lR8"
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Create or update config.yaml using the CLI"
      ],
      "metadata": {
        "id": "3ckPUCMJOqUx"
      },
      "id": "3ckPUCMJOqUx"
    },
    {
      "cell_type": "code",
      "source": [
        "# Create a default config yaml using the CLI\n",
        "!deeputils newyaml --config-file-path XX\n",
        "\n",
        "# new_yaml(config_file_path)"
      ],
      "metadata": {
        "id": "8aeDCF0fOz_U"
      },
      "id": "8aeDCF0fOz_U",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# Update the config yaml using the CLI\n",
        "!deeputils yaml --path XX --config_dict XX"
      ],
      "metadata": {
        "id": "NHxgai2OP7uy"
      },
      "id": "NHxgai2OP7uy",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Full Run Command\n"
      ],
      "metadata": {
        "id": "L-X3TXQ1xTgp"
      },
      "id": "L-X3TXQ1xTgp"
    },
    {
      "cell_type": "markdown",
      "source": [
        "Once your files and data are configured correctly, you can launch a full run of DeepUtils with the following command:"
      ],
      "metadata": {
        "id": "AZ-TcoVR7EY2"
      },
      "id": "AZ-TcoVR7EY2"
    },
    {
      "cell_type": "code",
      "source": [
        "# For this demo, a full run can be done as listed below.\n",
        "# Generally, a full run would be run with the command:\n",
        "# !deeputils full-run config-file-path XX\n",
        "\n",
        "!deeputils demo-run config-file-path XX "
      ],
      "metadata": {
        "id": "PIm-Ll3w7Cb5"
      },
      "id": "PIm-Ll3w7Cb5",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "id": "204ce846-b060-48a0-8103-fd5d2ed23b3d",
      "metadata": {
        "id": "204ce846-b060-48a0-8103-fd5d2ed23b3d"
      },
      "source": [
        "## Start: Ingesting Data\n",
        "\n",
        "DeepUtils can ingest data in a few ways (this list will likely expand with more astrophysics-centered download options in the future): you can use example data from common benchmark datasets, randomly generated astrophysical data, or your own custom dataset."
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# Use the demo version of the ingestion command for the \"custom\" dataset input types.\n",
        "# It downloads some small example custom datasets and automatically\n",
        "# updates your yaml file with the data paths and data configuration types. ONLY FOR DEMO.\n",
        "\n",
        "!deeputils demo-ingest config-file-path \"./config/\"  \n",
        "\n",
        "# Usually, the command would be:\n",
        "# !deeputils ingest config-file-path \"./config/\"  "
      ],
      "metadata": {
        "id": "vI7XNSxvXN87"
      },
      "id": "vI7XNSxvXN87",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# To use generated DeepBench data instead, run the following:\n",
        "# !deeputils create-bench-config config-file-path \"./config/\" --config_dict {dictionary/values/of/bench/data}\n",
        "\n",
        "!deeputils create-bench-config config-file-path \"./config/\" --config_dict \"{num_objects: 50}\""
      ],
      "metadata": {
        "id": "T3Yd2M7MYdam"
      },
      "id": "T3Yd2M7MYdam",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# To use data from DeepGotData, use the command below.\n",
        "# --data_path is optional and defaults to the current working directory.\n",
        "\n",
        "#!deeputils download-got-data --project_id XX --data_path XX"
      ],
      "metadata": {
        "id": "4ltTuL_rZSAX"
      },
      "id": "4ltTuL_rZSAX",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# We'll begin the pipeline by running the command for a demo of DeepUtils.\n",
        "\n",
        "!deeputils demo-ingest config-file-path \"./config/\""
      ],
      "metadata": {
        "id": "10XTyE_3Wo2-"
      },
      "id": "10XTyE_3Wo2-",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Preprocessing Data"
      ],
      "metadata": {
        "id": "uHhnyiezfBkM"
      },
      "id": "uHhnyiezfBkM"
    },
    {
      "cell_type": "markdown",
      "source": [
        "(Run the previous ingestion step first if you have not added any data!) DeepUtils preprocessing automates basic feature analysis with pandas-profiling, erasing or replacing missing data (when applicable), interpolating data (when applicable), removing outliers, and calculating feature importance.\n",
        "\n",
        "---\n",
        "\n"
      ],
      "metadata": {
        "id": "jvkrNlN17ula"
      },
      "id": "jvkrNlN17ula"
    },
    {
      "cell_type": "code",
      "source": [
        "# Running basic preprocessing is as simple as follows:\n",
        "!deeputils demo-preprocess config-file-path \"./config/\" --option/featuring/desired/preprocessing/method --erase_missing"
      ],
      "metadata": {
        "id": "CmmXMULu8k6Z"
      },
      "id": "CmmXMULu8k6Z",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Model Selection, Training, and Diagnosis"
      ],
      "metadata": {
        "id": "6NiuYJI982Vf"
      },
      "id": "6NiuYJI982Vf"
    },
    {
      "cell_type": "markdown",
      "source": [
        "The initial model and training configuration are selected primarily in the config file, so running the command-line script is as easy as a short, one-line command.\n",
        "\n",
        "Diagnostics are not a separate command; they run automatically when training and testing conclude."
      ],
      "metadata": {
        "id": "BWOT36dh-HdW"
      },
      "id": "BWOT36dh-HdW"
    },
    {
      "cell_type": "code",
      "source": [
        "# Use --architecture-search to run an architecture search for your model\n",
        "\n",
        "!deeputils demo-modelling config-file-path \"./config/\" --architecture-search {TRUE/FALSE} --saved-model {/path/to/config/model}"
      ],
      "metadata": {
        "id": "In54sR_O9JME"
      },
      "id": "In54sR_O9JME",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "id": "59743c40-08f6-49ff-be21-f711f443f6bc",
      "metadata": {
        "id": "59743c40-08f6-49ff-be21-f711f443f6bc"
      },
      "source": [
        "## Assignment: Do it again\n"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "\n",
        "Now it's time to put this into action, with one critical difference. \n",
        "This time, use an image dataset (like MNIST or a truncated CIFAR-10). \n",
        "\n",
        "* Load your image dataset \n",
        "* Run an EDA \n",
        "* Preprocess, if you feel it is necessary \n",
        "* Write a small convolutional net, and train it \n",
        "* Test how good your results are! "
      ],
      "metadata": {
        "id": "5X8CNBioEQlj"
      },
      "id": "5X8CNBioEQlj"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "id": "32d71b62-448f-41cc-9c5b-99db320129f8",
      "metadata": {
        "id": "32d71b62-448f-41cc-9c5b-99db320129f8"
      },
      "outputs": [],
      "source": []
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.5"
    },
    "colab": {
      "provenance": []
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}